Social and moral psychology of COVID-19 across 69 countries

The COVID-19 pandemic has affected all domains of human life, including the economic and social fabric of societies. One of the central strategies for managing public health throughout the pandemic has been persuasive messaging and collective behaviour change. To help scholars better understand the social and moral psychology behind public health behaviour, we present a dataset comprising 51,404 individuals from 69 countries. This dataset was collected for the International Collaboration on Social & Moral Psychology of COVID-19 project (ICSMP COVID-19). This social science survey invited participants around the world to complete a series of moral and psychological measures, as well as questions on public health attitudes about COVID-19, during an early phase of the pandemic (between April and June 2020). The survey included seven broad categories of questions: COVID-19 beliefs and compliance behaviours; identity and social attitudes; ideology; health and well-being; moral beliefs and motivation; personality traits; and demographic variables. We report both raw and cleaned data, along with all survey materials, data visualisations, and psychometric evaluations of key variables.

Table 1. Sample size, average proportion of valid answers, age of respondents and number of data collections in 69 countries (A-M). Note: Country = country names in accordance with ISO3 codes; N = number of respondents in each country; <50% and <90% = average proportion of valid (non-NA) answers below 0.5 and 0.9, respectively, at the subject level; μ Age = mean age; sd Age = standard deviation of age; Multiple datasets = whether there were multiple data collections in the country. Multiple subsamples can be observed for Brazil, Canada, Colombia, India, Italy, Mexico and Romania.

www.nature.com/scientificdata

Demographic variables across countries are summarised in several tables: Tables 1, 2 show the number of participants, the mean proportion of non-missing 'valid' answers, and age; Tables 3, 4 illustrate the distribution of gender; Tables 5, 6 show employment status; and Tables 7-9 show marital status and number of children. When multiple samples were collected within the same country, the data were split into numbered subgroups (e.g., Brazil, which has three samples, was split into Brazil_1, Brazil_2 and Brazil_3). In all the tables, we kept country subsamples separate to highlight that they were collected by different teams, often using different sampling methodologies or languages, which affects their characteristics (e.g., representativeness).
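The numbered-subgroup convention can be sketched in a few lines. This is a minimal Python illustration, not the project's R code; it assumes (hypothetically) that data collections are listed in the order they were received and that a country with a single collection keeps its plain name.

```python
from collections import Counter


def label_subsamples(collections):
    """Return one label per data collection.

    `collections` lists the country of each data collection (hypothetical
    input format). Countries with a single collection keep their name;
    countries with several are numbered in order: Brazil_1, Brazil_2, ...
    """
    totals = Counter(collections)
    running = Counter()
    labels = []
    for country in collections:
        if totals[country] == 1:
            labels.append(country)
        else:
            running[country] += 1
            labels.append(f"{country}_{running[country]}")
    return labels


print(label_subsamples(["Brazil", "Chile", "Brazil", "Brazil"]))
# ['Brazil_1', 'Chile', 'Brazil_2', 'Brazil_3']
```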
For the most part, participants were recruited via professional survey research companies and were incentivised to participate. In countries that, to our knowledge, did not possess polling infrastructure 30, incentivising participants was not feasible. To collect data in these countries, leaders of national teams relied on online

Table 2. Sample size, average proportion of valid answers, age of respondents and number of data collections in 69 countries (N-V). Note: Country = country names in accordance with ISO3 codes; N = number of respondents in each country; <50% and <90% = average proportion of valid (non-NA) answers below 0.5 and 0.9, respectively, at the subject level; μ Age = mean age; sd Age = standard deviation of age; Multiple datasets = whether there were multiple data collections in the country. Multiple subsamples can be observed for Brazil, Canada, Colombia, India, Italy, Mexico and Romania.
Materials. The measures we used are illustrated in Figs. 4, 5 along with the specific items listed for each measure.
In most cases, participants' responses were collected on a scale from 0 = 'strongly disagree' to 10 = 'strongly agree', with 5 = 'neither disagree nor agree'. In some cases, when more appropriate, we used other response scales (e.g., for the generosity measure, a 0-100% response scale was applied to hypothetical donations). In total, we collected 98 unique variables and metadata. To ensure participants' anonymity, no data that would allow their identification were collected.

COVID-19 beliefs and compliance. Four constructs: COVID-19 public health support, COVID-19 risk perception, COVID-19 conspiracy theory beliefs, and COVID-19 testing behaviour. The public health support construct is, in turn, composed of three measures: spatial distancing, physical hygiene, and policy support. These are ad hoc scales that we developed ourselves.
Identity and social attitudes. Three constructs: national identification 31 , national narcissism 32 , and social belonging 33 .
Ideology. One construct: political ideology. Participants self-reported their political orientation on a single item ranging from 0 ("Very left-leaning") to 10 ("Very right-leaning"). This measure has been shown to account for a significant proportion of the variance in voting intentions in American presidential elections between 1972 and 2004 34 and in 2016 [35][36][37]. Indeed, using a single-item scale to measure political ideology is common practice in the political psychology literature, and both national and international research provide substantive evidence for the validity of the measure 38,39. However, although symbolic ideology can be a useful and parsimonious instrument for studying political attitudes, users interpreting results should be attentive to the political and cultural applicability, psychometric validity, and generalisability of measures of political ideology [40][41][42].

Table 4. Distribution of sex in 69 countries (N-V). Note: Country = country names in accordance with ISO3 codes; % Female = proportion of female respondents in the country; % Male = proportion of male respondents; % Other = proportion of non-binary respondents; % NA = proportion of respondents who did not report their sex.
Demographics. Six questions: age, number of children, employment status, marital status, gender, and urbanicity.
Metadata and attention check. An attention check was used to mitigate the negative impact of potential non-human responses on data quality, and thus the risk of biasing the data and subsequent analyses of low base-rate outcomes, such as endorsement of COVID-19 conspiracy theories. We collected typical questionnaire metadata (e.g., start, record, and end dates, duration, and language). In addition, we created an internal participant ID and added ISO2 and ISO3 country codes and a sample-representativeness code.

Translation. The survey instrument was drafted in English and translated into other languages using the standard forward-backward method (i.e., national teams were advised to split into two groups, one forward-translating the survey into the local language and the other back-translating it into English, and then to have the two groups discuss and resolve discrepancies). In total, the survey instrument was translated into 32 languages, including adaptations to region-specific dialects or vernaculars: specifically, from English into Arabic, Bengali, Bulgarian, Croatian, Danish, Dutch, Finnish, French, German, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Kurdish, Latvian, Macedonian, Mandarin simplified, Mandarin traditional, Nepali, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swedish, Turkish, and Ukrainian (see osf.io/tfsza, sub-folder Translations).
Data cleaning. We received individual data files from each national team. To merge these raw data, we introduced minor modifications, which we delineate in this section. First, we renamed columns to match across data sets, reordered variables alphabetically, and standardised variable labels. Furthermore, all missing values and values denoting the absence of a response were converted to NAs (not available). When ambiguous date formats were found (e.g., in start date, end date, and record date), we manually specified the correct format and standardised them. In the second stage, we introduced multiple modifications to clean the data for research. Some modifications were applied to every national data set, while others were applied only to specific national data sets (both are thoroughly reported in the Data Records section). In each national data set, we recoded the attention check (attcheck) into pass (1) or fail (0); standardised the generosity items (generosity1-3); recoded CRT items into intuitive (2), correct (1), and incorrect (0); converted the number of children (children) into a variable with a fixed range from zero to ten or more; recoded the age of all participants reporting being older than 100 as 100; and excluded all duplicates (i.e., when multiple participants were recorded with identical inputs within a national data set, only the first was retained).
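Several of the per-country recodes listed above can be sketched in a few lines. The Python below is a hypothetical illustration rather than the project's R pipeline (ICSMP official data.Rmd): the dictionary record format and the "age" key are assumptions (only attcheck and children are variable names given in the text), the attention check is matched exactly against "213" whereas the project also used REGEX, and duplicates are assumed to be dropped before recoding.

```python
# Hypothetical sketch of four per-country recodes; not the project's R code.
# Assumed record format: dict of raw string answers (or None) with keys
# "attcheck", "age" and "children".

MISSING = {"", "NA"}  # assumed encodings of a missing answer


def recode_attcheck(raw):
    """1 = pass (answer is '213'), 0 = fail, None = missing / not reached."""
    if raw is None or raw.strip() in MISSING:
        return None
    return 1 if raw.strip() == "213" else 0


def _to_int(raw):
    """Parse an integer answer; None for missing or non-numeric input."""
    try:
        return int(raw.strip())
    except (AttributeError, ValueError):
        return None


def cap(raw, ceiling):
    """Parse an integer and recode values above `ceiling` as `ceiling`."""
    value = _to_int(raw)
    return None if value is None else min(value, ceiling)


def drop_duplicates(records):
    """Keep only the first of any records with identical inputs."""
    seen, kept = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def clean(records):
    """Deduplicate first (an assumed order), then recode each record."""
    return [
        {
            "attcheck": recode_attcheck(rec.get("attcheck")),
            "age": cap(rec.get("age"), 100),  # ages above 100 recoded as 100
            "children": cap(rec.get("children"), 10),  # 10 = "ten or more"
        }
        for rec in drop_duplicates(records)
    ]
```

For example, two identical records with age "130" and children "12" collapse into one record coded as age 100 and children 10.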

Data records
All materials associated with the ICSMP COVID-19 project can be found in the project's repository (comprising five folders) hosted by the Open Science Framework (OSF, https://doi.org/10.17605/osf.io/tfsza) 56,57. The folder named Code includes an R Markdown document (ICSMP official data.Rmd; osf.io/dwpng) that loads the data files from each national team, cleans them, merges them into a single data file, generates a data-driven codebook, and saves all outputs. It also includes a reproducible report with all numbers, analyses and graphs reported in this article (Analyses-SciData.html; osf.io/s5c4p; Analyses SciData.Rmd; osf.io/9suyb). The folder named Data includes three sub-folders. The Raw data sub-folder contains the original, unmodified data files from each national team (country data files.zip; osf.io/dqmut). The Cleaned data sub-folder contains the merged and cleaned dataset, provided in both a non-proprietary (ICSMP_cleaned_data.csv; osf.io/ypkrc) and a labelled (ICSMP_cleaned_data.sav; osf.io/8tyj9) file format. In addition, we included in a sub-folder a dataset that excludes observations that failed the attention check or completed less than 50% of the items, again in both a non-proprietary (ICSMP_cleaned_data_nobots.csv; osf.io/98fex) and a labelled (ICSMP_cleaned_data_nobots.sav; osf.io/3yjga) file format. The Metadata sub-folder provides a thorough itemised description of the data cleaning process as both a text document (Data Cleaning.docx; osf.io/7udpt) and a human-readable change log (human-readable change log ICSMP.xlsx; osf.io/fydx2).
We also provide a data-driven codebook detailing how each measure was collected, e.g., listing variable names, variable labels, and value labels (dt.codebook.xlsx; osf.io/ecva2). The IRB folder contains both the Institutional Review Board ethics application (ICSMP Kent Ethics application full.pdf; osf.io/xt9gr) and the ethics approval (ICSMP Kent Ethics approval.pdf; osf.io/ce638). The folder Sample Type & Representativeness includes the documentation of an internal survey conducted with national team leaders about the survey methodology they employed (Sample Type & Representativeness.zip; osf.io/fj5xn). The folder Survey Instrument contains the initial English version of our survey instrument along with its Qualtrics .qsf file for reproducibility (Survey Instrument.zip; osf.io/nf48q). In the sub-folder Translations, we archived all 32 translated survey instruments along with a report on the languages in which the surveys were conducted in each country (several countries ran their surveys in multiple languages; Country and language.xlsx; osf.io/wj7d2).
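The exclusion rule behind the ICSMP_cleaned_data_nobots files can be approximated from the cleaned CSV. The Python below is a minimal sketch under stated assumptions, not the project's code: which item columns count toward the 50% threshold (the hypothetical `items` argument), the "NA" encoding of missing values, and keeping rows whose attcheck is missing are all assumptions. The published nobots files remain the authoritative version.

```python
import csv

MISSING = {None, "", "NA"}  # assumed encodings of missing values in the CSV


def load_rows(path):
    """Read a CSV (e.g. ICSMP_cleaned_data.csv) into a list of dicts of strings."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def proportion_answered(row, items):
    """Share of the listed item columns that hold a non-missing value."""
    return sum(row.get(col) not in MISSING for col in items) / len(items)


def failed_attention_check(row):
    """True only when attcheck is coded 0; a missing attcheck is not a fail here."""
    return str(row.get("attcheck")).strip() == "0"


def nobots_subset(rows, items, min_share=0.5):
    """Drop rows that failed the attention check or answered < min_share of items."""
    return [
        row
        for row in rows
        if not failed_attention_check(row)
        and proportion_answered(row, items) >= min_share
    ]
```

Rows that answered exactly half of the items are kept, since the text only removes those that completed less than 50%.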
Potential for future research. The data contain four measures of COVID-19 beliefs and compliance, 17 social and moral psychological constructs, and six sociodemographic characteristics, amounting to 27 socially relevant variables. To quantify the potential of this dataset, and assuming that a typical research paper uses three to five key constructs plus sociodemographics and controls, we calculated the number of combinations of the 17 constructs taken three, four, and five at a time, yielding a grand total of 9248 possible unique designs. As a demonstration of the broad scope of the ICSMP data, published studies span a range of psychological disciplines, including social psychology 13,14, cognitive psychology 15,17, political psychology 16, moral psychology 16,18, economic psychology 19 and health sciences 20, among others. They examine different populations in relation to the COVID-19 pandemic in terms of age (e.g., for older adults, see 21), marital status 19, or nationality (e.g., for a study on the Spanish population, see 22; for the Swedish and Chinese populations, see 23), and other socio-demographic characteristics. These studies attest to the potential of the ICSMP data to inspire further research. In sum, the present dataset affords numerous opportunities for testing a wide range of cross-cultural hypotheses. We encourage researchers who consider reusing ICSMP data to examine the list of pre-registrations before beginning a new project so as to avoid duplication (see icsmp-covid19.netlify.app/preregistration).
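The figure of 9248 designs is the sum of three binomial coefficients, C(17,3) + C(17,4) + C(17,5) = 680 + 2380 + 6188, and can be checked with a few lines of standard-library Python:

```python
from math import comb

N_CONSTRUCTS = 17  # social and moral psychological constructs in the dataset

# Number of distinct construct sets of size 3, 4 and 5.
designs = {k: comb(N_CONSTRUCTS, k) for k in (3, 4, 5)}
total = sum(designs.values())

print(designs)  # {3: 680, 4: 2380, 5: 6188}
print(total)    # 9248
```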

Data visualisation interface. In addition to the raw data, a dedicated Web application was developed to […] for the readers, both tables and figures are downloadable. The Shiny app has two tabs giving general information about the project and the international consortium. The first tab contains sample descriptions, such as sample size, missing data, and attention checks for each country, together with a Gantt chart showing the dates of data collection. The second tab displays world maps of spatial distancing, policy support, national identity, conspiracy beliefs, national narcissism and morality as cooperation, as well as all reported tables in dynamic formats.

Technical Validation
To assess the technical quality of the dataset, we conducted analyses demonstrating its reliability (and its applicability to diverse research questions in the social sciences and beyond). For completeness, in the analyses that follow, we examined all samples, including those with very few observations, such as Puerto Rico (N = 2), Brazil_3 (N = 6), and Panama (N = 12).
We evaluated the survey methodology adopted by the national teams by conducting an internal survey to verify the reported sample types. The inspection showed that 28 samples were quota-based nationally representative samples (36%); 6 used post hoc weights to achieve an approximate level of national representation (8%) but should nonetheless be seen as convenience samples; and 43 were convenience samples (56%), many of which were from low- and middle-income countries 60. We coded the results of this survey into the cleaned data as the variable 'sample_coding' and present a summary in Table 10. National representativeness for the 28 quota-based samples refers to an approximation of each country's demographic characteristics with respect to age and gender only.
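The percentages above follow from the three counts, whose sum (77 samples, with country subsamples counted separately) is the denominator. A quick arithmetic check in Python:

```python
# Sample counts by type, as reported in the text.
counts = {
    "quota-based nationally representative": 28,
    "post hoc weighted (treated as convenience)": 6,
    "convenience": 43,
}
total = sum(counts.values())
percentages = {kind: round(100 * n / total) for kind, n in counts.items()}

print(total)                       # 77
print(list(percentages.values()))  # [36, 8, 56]
```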
Regarding individual-level data quality, Fig. 6 shows a world map of the 69 countries from which data were collected, coloured according to the overall percentage of missing data (overall mean = 6.0%). Overall, 95.6% of participants had less than 50% missing data, 92.8% had less than 10% missing data, and 24.7% had no missing data. Another indicator of data quality is the rate of attention check failures per country. On the last screen of the survey, participants were given the following instruction: "Help us get rid of bots: Please write the number 213 into the comment box." Participants who wrote "213" were coded as passing the attention check, participants who wrote anything else were coded as failing it, and those who did not reach this screen of the survey were coded as missing.

Table 10. Overview of the samples.

Fig. 6 Data quality indicators for each surveyed country. Note: The percentage of missing data considers all questions in the survey (i.e., all sociodemographic questions and psychological scales). For each country, we calculated the mean of the participants' proportions of missing data across all survey questions, including sociodemographics (this information, together with the R code, is also provided in our reproducible report for Fig. 6).
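The per-country statistic described in the caption (a mean of per-participant missing-data proportions) and the participant-level thresholds reported above can be sketched as follows. This is a hypothetical Python illustration, not the R code in the reproducible report: the `country` column name, the "NA" encoding and the `items` list of survey questions are assumptions.

```python
from collections import defaultdict
from statistics import mean

MISSING = (None, "", "NA")  # assumed missing-value encodings


def missing_share(row, items):
    """Proportion of the listed survey questions this participant left missing."""
    return sum(row.get(col) in MISSING for col in items) / len(items)


def country_missing_means(rows, items, country_key="country"):
    """Mean of participants' missing-data proportions, per country."""
    per_country = defaultdict(list)
    for row in rows:
        per_country[row[country_key]].append(missing_share(row, items))
    return {country: mean(shares) for country, shares in per_country.items()}


def share_of_participants_below(rows, items, threshold):
    """Share of participants whose missing proportion is strictly below threshold."""
    shares = [missing_share(row, items) for row in rows]
    return sum(s < threshold for s in shares) / len(shares)
```

Calling `share_of_participants_below` with thresholds 0.5 and 0.1 corresponds to the "less than 50%" and "less than 10%" figures; the share with no missing data would instead count proportions equal to zero.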
others, and 4.8% unreported). The employment status breakdown shows 44.8% employed full-time, 10.6% part-time, 8.1% unemployed, 10% students, 10.1% retired, 11% other, and 5.3% unreported. The overall marital status shows that 33% of respondents were single, 18.7% in a relationship, 42.7% married, and 5.5% unreported. The largest group of participants reported having no children (41.6%), with 16.7% having one child, 20.1%, 9.2%, and 3.9% having two, three and four children, respectively, and 1.7% having five or more children (6.9% unreported). We break down these aggregated results per country: Tables 1, 2 show the number of cases and valid answers, Tables 3, 4 summarise the distribution of sex, Tables 5, 6 display employment status, and Tables 7-9 illustrate marital status and the number of children.
We also examined cross-cultural differences in conspiracy beliefs, morality as cooperation, spatial distancing, national narcissism, national identification, and policy support for preventative measures across 69 countries (Fig. 7). Additionally, we show patterns of association between these moral and psychological constructs across gender, ideology and age in Figs. 8, 9. For the association pattern analysis, as well as for the subsequent internal consistency analysis, we excluded samples with fewer than 490 respondents, as recommended for stable correlations 61.
To examine the internal consistency of the main scales, we calculated Cronbach's alpha, omega, Guttman's split-half reliability, and the proportion of variance explained by a unidimensional factor. The resulting table, available at osf.io/ed7yg, shows these internal consistency indices by country for the measures of conspiracy beliefs, morality as cooperation, spatial distancing, national narcissism, national identification, and policy support for preventative measures. We found that the spatial distancing construct, on average, has the lowest Cronbach's alpha, followed by morality as cooperation. On average, conspiracy beliefs have the highest Cronbach's alpha, followed by policy support. These patterns hold for the omega measures, but when considering Guttman's split-half reliability, collective narcissism and national identity yield the lowest values. Figures 9-15 show these patterns visually.

Conspiracy Beliefs = participant's beliefs in conspiracy theories regarding COVID-19; Morality as Cooperation = participant's moral concern based on the morality-as-cooperation theory; Spatial Distancing = participant's support for spatial distancing as a strategy against COVID-19; Collective Narcissism = participant's narcissism, i.e., an inflated view of their ingroup (in this research, we focused on nationality); National Identity = participant's identity attached to belonging to a nation; Policy Support = participant's support for public policies (e.g., closing parks or schools) as a strategy against COVID-19.
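Of the indices listed above, Cronbach's alpha is the simplest to compute by hand: alpha = k / (k - 1) × (1 - Σ item variances / variance of the total score), where k is the number of items. The standard-library Python below is a minimal sketch of that textbook formula on complete cases; it is not the R implementation used for the published table, and omega and split-half reliability (which require factor or split models) are not shown.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` is a list of item columns, each a list of numeric scores from the
    same respondents in the same order (complete cases only).
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_variance = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


print(round(cronbach_alpha([[1, 2, 3], [2, 2, 4]]), 4))  # 0.9231
```

Because sample variances appear in both the numerator and the denominator, using population variances instead would give the same value.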

Usage Notes
The datasets are shared, cleaned, and ready for analysis. We recommend that interested researchers use the cleaned version of the data (available at https://doi.org/10.17605/osf.io/tfsza) 56. The labelled data are also suggested for convenience, as all variable levels are encoded in them; this eliminates the need to consult the codebook, which is required when using the .csv format. The data were imported and cleaned using the R software for statistical analysis 62 and the packages readr 63, haven 64, readxl 65, dplyr 66, psych 67, htmltools 68, mime 69, xfun 70, labelled 71, sjlabelled 72, codebook 73, and lubridate 74.
To minimise misclassification of text-based responses to the cognitive reflection test (CRT) and the attention check, we used multiple steps of data cleaning with REGEX (regular expressions), as fully detailed in ICSMP official data.Rmd (osf.io/dwpng), located in the folder named Code. First, we coded the predefined numerical and text values as correct (and, for the CRT, also the values predefined as intuitive). Then, we iteratively screened the remaining responses and updated their codes using REGEX. All responses that still remained were recoded as incorrect.
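The three-step procedure can be sketched for a single CRT item. The item below is hypothetical: we assume a question whose correct answer is 5 (cents) and whose intuitive answer is 10 (cents), as in the classic bat-and-ball problem; the survey's actual items and patterns are defined in ICSMP official data.Rmd. Codes follow the scheme in the Data cleaning section: intuitive (2), correct (1), incorrect (0).

```python
import re

# HYPOTHETICAL item: correct answer 5 (cents), intuitive answer 10 (cents).
PREDEFINED = {"5": 1, "10": 2}  # step 1: exact predefined values
PATTERNS = [                    # step 2: regex for free-text variants
    (re.compile(r"^(?:5|five)\s*(?:cents?)?\.?$", re.IGNORECASE), 1),
    (re.compile(r"^(?:10|ten)\s*(?:cents?)?\.?$", re.IGNORECASE), 2),
]


def code_crt(raw):
    """Return 2 = intuitive, 1 = correct, 0 = incorrect, None = missing."""
    if raw is None or not raw.strip():
        return None
    answer = raw.strip()
    if answer in PREDEFINED:
        return PREDEFINED[answer]
    for pattern, code in PATTERNS:
        if pattern.match(answer):
            return code
    return 0  # step 3: anything still unmatched is coded incorrect
```

In practice the pattern list would grow over several screening passes; anchoring each pattern to the whole answer keeps inputs such as "50" or "5.5" from being counted as correct.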

Code availability
All raw and cleaned data, as well as the R code used for standardising, merging, and cleaning the national teams' data, are available at https://doi.org/10.17605/osf.io/tfsza 56.